Skip to content
This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Analyze lctl output for real snapshot listing failures #2282

Closed
wants to merge 1 commit into from

Conversation

ip1981
Copy link
Member

@ip1981 ip1981 commented Sep 29, 2020

Closing #2248.

Signed-off-by: Igor Pashev pashev.igor@gmail.com


This change is Reviewable

@ip1981 ip1981 added the bug label Sep 29, 2020
@ip1981 ip1981 requested a review from a team September 29, 2020 13:50
@ip1981 ip1981 self-assigned this Sep 29, 2020
@ip1981 ip1981 marked this pull request as ready for review September 29, 2020 18:30
@ip1981
Copy link
Member Author

ip1981 commented Sep 30, 2020

It may be a good idea to catch "resource temporarily unavailable" at lctl call here:

pub async fn lctl<I, S>(args: I) -> Result<String, ImlAgentError>
where
I: IntoIterator<Item = S>,
S: AsRef<OsStr>,
{
Command::new("/usr/sbin/lctl")
.args(args)
.checked_output()
.err_into()
.await
.map(|o| String::from_utf8_lossy(&o.stdout).to_string())
}

and then retry.

So all command would work more reliably.

@ip1981 ip1981 marked this pull request as draft October 6, 2020 17:24
@ip1981 ip1981 marked this pull request as ready for review October 19, 2020 13:34
@ip1981 ip1981 requested a review from a team October 19, 2020 17:05
@jgrund jgrund self-requested a review October 20, 2020 19:46
.checked_output()
.await;

if let Err(CmdError::Output(o)) = &r {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should wrap the lctl call as a new fn instead of being within it.

Copy link
Member Author

@ip1981 ip1981 Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it first, but the function was private anyway consisting only of Command::new("/usr/sbin/lctl") (after I removed premature decoding of output)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this fn lctl_retry, and keep the original lctl fn in place. the lctl fn is being used in a few other modules already, and it's behavior should stay the same I think.

Copy link
Member Author

@ip1981 ip1981 Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, other modules will just become more sound out of the box. I don't think there is sense in failing for a temporarily unavailable resource.

Initially there were two checks when listing snapshots, but this one (resource temporarily unavailable) seems the best be handled earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we seen this on any other lctl call (and do we know the behavior is the same)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's valid question. As a precaution we indeed can have two version of lctl function. But the lctl executable is using perror() or strerror() consistently. And retrying temporary failures is still needed.

It is really simple: we would loop iff lctl prints "Resource temporarily unavailable" (to stderr + exit code, it could not be a snapshot comment :) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the call into IML's liblustreapi library? Might be cleaner than doing a shell command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it lctl_retry

@ip1981
Copy link
Member Author

ip1981 commented Oct 21, 2020

I successfully created few snapshots from two terminals at the same time :)

@jgrund jgrund self-requested a review October 21, 2020 10:18
Signed-off-by: Igor Pashev <pashev.igor@gmail.com>
@jgrund
Copy link
Member

jgrund commented Dec 4, 2020

Replaced by #2364

@jgrund jgrund closed this Dec 4, 2020
@jgrund jgrund deleted the ip1981/2248-be-smart branch December 4, 2020 17:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshots should not disappear from the DB Destruction of snapshot failed b/c of locked ldev.conf
4 participants